1.  Introduction


There  are  many  numerical  procedures  for  calculating  the  maximum  likelihood 
estimates  for  loglinear  models  of  frequency  data.  The  most  popular  methods  are 
the  Iterative  Proportional  Fitting  Procedure  (IPFP)  and  variants  of  Newton’s 
method.  For  problems  involving  a  large  number  of  parameters  Newton’s  method  is 
often  impractical.  On  the  other  hand  many  models  can  not  be  expressed  in  a 
form  which  allows  the  simple  IPFP  to  be  applied.  In  these  circumstances  some 
other  nonlinear  optimization  technique  (e.g.  the  Generalized  Iterative  Scaling 
method  of  Darroch  and  Ratcliff  (1972)  or  the  extensions  of  the  IPFP  due  to 
Haberman  (1974))  must  be  used.  As  the  basic  IPFP  is  a  well  understood,  robust, 
and  widely  available  algorithm  it  would  often  be  desirable  to  cajole  a  given 
problem  into  a  form  where  the  IPFP  can  be  applied.  We  present  a  general 
theorem  on  transforming  contingency  tables  and  several  applications  where  the 
transformation  technique  has  allowed  us  to  take  advantage  of  the  IPFP  and 
resulted in simple and useful procedures. A further advantage of this technique
is that it is sometimes possible to recognize closed-form estimates in
the transformed problem while they would be overlooked in the original setting.

We shall view the estimation problem as one of minimizing the Kullback-Leibler
information distance between two probability mass functions (p.m.f.'s)
and will roughly follow the notation of Csiszar (1976). Although we have
adopted the information distance point of view, the duality between maximum
likelihood estimation and minimum information estimation (see e.g. Darroch and
Ratcliff (1972)) implies that the results of this paper can just as well be
interpreted from the maximum likelihood point of view.
2.  Background and Notation

Csiszar (1976) presents a very elegant discussion of the IPFP by developing
a "geometry" for the information measure. A simplified version of the chief
results of this theory is outlined below. Let n, p, q, r, s, and t denote
p.m.f.'s which are non-zero for all elements of a finite set $I$. The Kullback-Leibler
information number (or directed divergence) specifies a distance,

\[ I(p \| q) = \sum_{i \in I} p(i) \ln\bigl(p(i)/q(i)\bigr) , \]

between p and q. The principle of minimum discriminant information, as formulated
by Kullback (1959), aims to minimize the distance between a reference
distribution, q above, and a family of other distributions. The properties of
such estimates have been studied extensively. The most important results can
be found in Kullback (1959) and are summarized, with a special emphasis on
contingency tables, in Gokhale and Kullback (1978).
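
As a small illustration of the information number defined above, the following Python sketch evaluates $I(p \| q)$ for two strictly positive p.m.f.'s on a finite set. The function name and the example vectors are hypothetical and are not taken from the paper.

```python
import math


def information_number(p, q):
    """Return I(p || q) = sum_i p(i) * ln(p(i) / q(i)).

    Both arguments are sequences of strictly positive numbers of equal
    length, matching the paper's assumption of non-zero p.m.f.'s.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same index set")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical example values.
p_example = [0.2, 0.3, 0.5]
q_example = [1 / 3, 1 / 3, 1 / 3]
forward = information_number(p_example, q_example)
backward = information_number(q_example, p_example)
```

The two directions generally differ, which is why the measure is called a directed divergence.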

We next develop an appropriate family, $E$, of p.m.f.'s. A convex set, $E$,
of p.m.f.'s is called linear if when p and q are in $E$ and $t = a \cdot p + (1-a) \cdot q$
($a \in \mathbb{R}$) is a p.m.f., then t is also in $E$. A p.m.f. q which satisfies

\[ I(q \| r) = \min_{p \in E} I(p \| r) \]

is called the I-projection of r on $E$ and will be denoted by $q = P_E(r)$.

Csiszar gives conditions under which $P_E(r)$ exists (it is always unique) and
develops a geometry for I-projections by using an analogue of Pythagoras'
Theorem. Now let $F = \{f_\gamma : \gamma \in \Gamma\}$ be a set of real valued functions on $I$ and
$A = \{a_\gamma : \gamma \in \Gamma\}$ be real constants. Define $M_F$ to be span($F$). A linear set, $E$,
can be constructed by considering the set of p for which

\[ \sum_{i \in I} p(i) \, f_\gamma(i) = a_\gamma ; \quad \gamma \in \Gamma . \]

When we consider s to be an observed probability function and

\[ a_\gamma = \sum_{i \in I} s(i) \, f_\gamma(i) ; \quad \gamma \in \Gamma \]

then the duality between maximum likelihood and minimum discriminant
estimation states that if

\[ \hat{q} = P_E(r) \]

then

\[ \ln(\hat{q}) \in M_F + \ln(r) \]

and

\[ \hat{q} - s \in M_F^{\perp} , \]

i.e. $\hat{q}$ is the m.l.e. (under Poisson sampling) for the corresponding
log-affine model. Csiszar's principal theorem says that if $E$ is the finite
intersection of the linear sets $E_k$ (i.e. $E = \bigcap_{k \in K} E_k$) then $\hat{q} = P_E(r)$
is the pointwise limit of $q_n = P_{E_n}(q_{n-1})$, $n = 1,2,3,\ldots$, where $q_0 = r$ and
$E_n = E_i$ if $i \equiv n \bmod |K|$.
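
The cyclic scheme in Csiszar's theorem can be sketched in Python for the simplest case of two linear sets, one fixing the row margins and one fixing the column margins of a two-way table. For such 0-1 constraints each I-projection is a proportional rescaling, which is the familiar IPFP step. The function names, the number of cycles, and the example table `s_example` are assumptions made for illustration only.

```python
import numpy as np


def project_rows(q, row_targets):
    """I-projection onto the set of tables with the given row sums."""
    return q * (row_targets / q.sum(axis=1))[:, None]


def project_cols(q, col_targets):
    """I-projection onto the set of tables with the given column sums."""
    return q * (col_targets / q.sum(axis=0))[None, :]


def cyclic_projection(r, row_targets, col_targets, n_cycles=100):
    """Alternate the two projections, starting from q_0 = r."""
    q = np.asarray(r, dtype=float).copy()
    for _ in range(n_cycles):
        q = project_rows(q, row_targets)
        q = project_cols(q, col_targets)
    return q


# Hypothetical observed probability function s and a constant starting table r.
s_example = np.array([[0.10, 0.20, 0.05],
                      [0.05, 0.15, 0.10],
                      [0.10, 0.05, 0.20]])
r_example = np.ones((3, 3))
q_hat = cyclic_projection(r_example, s_example.sum(axis=1), s_example.sum(axis=0))
```

With a constant starting table the limit is the independence fit, i.e. the outer product of the row and column margins of `s_example`.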


Example  1.  Ordered  Categories 

Let p be an observed 3 x 3 probability function obtained via multinomial
sampling and consider the ordered categories model

\[ \ln(q_{ij}) = \alpha_i + \beta_j + j \cdot \gamma_i + i \cdot \delta_j ; \quad i,j = 1,2,3 . \]


The linear manifold for this model is spanned by a set of
tables, $f_R^i$, $f_{OR}^i$, $f_C^j$, and $f_{OC}^j$; $i,j = 1,2,3$. The subscripts R, OR, C and
OC indicate that the vector corresponds to Row, Ordered Row, Column or
Ordered Column parts of the model, while the superscript indicates the
row or column number.


The general structure is that $f_R^i$ (or $f_C^j$) is a table of zeros except
for the i'th row (j'th column) which contains ones, i.e.,

\[ f_R^i(k,l) = \begin{cases} 1 & k = i \\ 0 & k \neq i . \end{cases} \]

Similarly, for the ordered row and column tables, the general form is

\[ f_{OC}^j(k,l) = \begin{cases} k-1 & l = j \\ 0 & l \neq j . \end{cases} \]


We now group the spanning tables into sets of related constraints. Let

\[ F_R = \{ f_R^i, f_{OR}^i : i = 1,2,3 \} \]

and

\[ F_C = \{ f_C^j, f_{OC}^j : j = 1,2,3 \} . \]

The sets of constants, $A_R$ and $A_C$, are determined by the inner products
of p with the spanning vectors.
The linear spaces of p.m.f.'s corresponding to these constraints and
constants are:

\[ E_R = \{ \text{p.m.f.'s } p \text{ s.t. } \sum_{k,l} f_A^i(k,l) \, p(k,l) = a_A^i ; \; A = R, OR ; \; i = 1,2,3 \} \]

\[ E_C = \{ \text{p.m.f.'s } p \text{ s.t. } \sum_{k,l} f_B^j(k,l) \, p(k,l) = a_B^j ; \; B = C, OC ; \; j = 1,2,3 \} \]

In order to find the M.L.E.'s of cell probabilities for this model we
need to be able to compute $\hat{q} = P_E(r)$ for $r(k,l) = 1$, $\forall k,l$ and $E = E_R \cap E_C$.
The theory tells us that this I-projection can be obtained by
cyclically projecting onto $E_R$ and $E_C$.
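
To make the spanning tables of Example 1 concrete, the sketch below builds the row, ordered row, column and ordered column tables for a 3 x 3 problem and computes the constants as inner products with an observed table. It assumes the ordered tables carry the scores 0, 1, 2 (that is, $l-1$ or $k-1$), as in the general form given above. The dictionary keys and the example table `p_obs` are hypothetical.

```python
import numpy as np


def spanning_tables(n=3):
    """Return a dict mapping (label, index) to an n x n spanning table.

    Labels are "R", "OR", "C" and "OC"; indices are 1-based to match the text.
    """
    scores = np.arange(n, dtype=float)  # 0, 1, ..., n-1, i.e. (category - 1)
    tables = {}
    for i in range(n):
        f_r = np.zeros((n, n))
        f_r[i, :] = 1.0            # ones in row i
        f_or = np.zeros((n, n))
        f_or[i, :] = scores        # column scores l-1 in row i
        f_c = np.zeros((n, n))
        f_c[:, i] = 1.0            # ones in column i
        f_oc = np.zeros((n, n))
        f_oc[:, i] = scores        # row scores k-1 in column i
        tables[("R", i + 1)] = f_r
        tables[("OR", i + 1)] = f_or
        tables[("C", i + 1)] = f_c
        tables[("OC", i + 1)] = f_oc
    return tables


def constants(p, tables):
    """Inner products of the observed table p with each spanning table."""
    return {key: float((f * p).sum()) for key, f in tables.items()}


# Hypothetical observed 3 x 3 probability function.
p_obs = np.array([[0.10, 0.20, 0.05],
                  [0.05, 0.15, 0.10],
                  [0.10, 0.05, 0.20]])
a_consts = constants(p_obs, spanning_tables())
```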


3.  Motivation  for  Transformations 

As  algorithms  for  the  basic  IPFP  are  widely  available,  it  is  often 
advantageous  for  us  to  be  able  to  pose  a  problem  in  a  way  that  makes  it 
amenable  to  attack  by  means  of  these  programs. 

A  very  simple  example,  which  is  prototypical  of  those  that  will 
arise  in  our  later  discussion,  can  be  constructed  as  follows. 

Example  2 

Consider a triple of observed counts $z = (z_1, z_2, z_3)$ from 3
independent Poisson random variables with mean $m = (m_1, m_2, m_3)$ and
having observed values (1, 3, 5). Suppose we wish to fit the
log-affine model,


It is a simple matter to verify that the M.L.E. is
$\hat{m} = (.694, 3.611, 4.694)$. Now consider the related contingency table

\[ z^* = \begin{pmatrix} 2 & 3 \\ 3 & 10 \end{pmatrix} = \begin{pmatrix} 2z_1 & z_2 \\ z_2 & 2z_3 \end{pmatrix} \]

and the model for the mean, $m^*$,


Now note that

\[ \hat{m}^* = \begin{pmatrix} 1.389 & 3.611 \\ 3.611 & 9.389 \end{pmatrix}
 = \begin{pmatrix} 5 \times 5/18 & 5 \times 13/18 \\ 5 \times 13/18 & 13 \times 13/18 \end{pmatrix}
 = \begin{pmatrix} 2\hat{m}_1 & \hat{m}_2 \\ \hat{m}_2 & 2\hat{m}_3 \end{pmatrix} . \]

In other words it is possible to fit the "difficult" model, $M$, by
transforming the table and fitting the "easy" model, $M^*$, to the
transformed table. In the process of doing this transformation we
have also recognized that the original log-affine model actually had
closed-form estimates, namely

\[ \hat{m}_1 = (2z_1 + z_2)^2 / (4 \times (z_1 + z_2 + z_3)) \]

\[ \hat{m}_2 = (2z_1 + z_2)(2z_3 + z_2) / (2 \times (z_1 + z_2 + z_3)) \]

\[ \hat{m}_3 = (2z_3 + z_2)^2 / (4 \times (z_1 + z_2 + z_3)) \]

This example is clearly contrived to please Dr. Pangloss. We shall
later present a more realistic version with similar consequences.
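
The arithmetic of Example 2 can be checked with a few lines of Python. The sketch uses the counts z = (1, 3, 5), which are the values implied by the table entries 2, 3, 3, 10 and by the reported M.L.E. It fits the row and column margins (independence) model to the 2 x 2 table in closed form and compares the back-transformed values with the closed-form estimates. The function names are hypothetical.

```python
def closed_form_mle(z1, z2, z3):
    """Closed-form estimates expressed through v1 = 2*z1 + z2 and v2 = z2 + 2*z3."""
    n = z1 + z2 + z3
    v1 = 2 * z1 + z2
    v2 = z2 + 2 * z3
    return (v1 * v1 / (4 * n), v1 * v2 / (2 * n), v2 * v2 / (4 * n))


def independence_fit(table):
    """Fitted values (row total * column total / grand total) for a 2 x 2 table."""
    rows = [sum(row) for row in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(rows)
    return [[rows[i] * cols[j] / total for j in range(2)] for i in range(2)]


z = (1, 3, 5)
z_star = [[2 * z[0], z[1]], [z[1], 2 * z[2]]]   # the transformed table
m_star = independence_fit(z_star)               # "easy" model on z*
m_hat = (m_star[0][0] / 2, m_star[0][1], m_star[1][1] / 2)
m_closed = closed_form_mle(*z)
```

Both routes give the same triple, which agrees with the reported (.694, 3.611, 4.694) to three decimals.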


In the preceding example we transformed the data into a form where
it  was  much  easier  to  compute  the  M.L.E.  of  the  vector  of  expected 
values.  Of  course  we  have  yet  to  prove  that  the  above  manipulation  is 
any  more  than  a  numerical  coincidence;  such  proofs  are  the  subject  of 
this  paper. 

The idea of modifying a problem so that it is amenable to analysis
by existing or easier methods is not at all new. An old example of
this phenomenon is the method of filling in missing values to transform
an "unbalanced" analysis of variance into a "balanced" problem. Although
fitting an ANOVA model to an incomplete data array is conceptually easy,
the calculations are much simpler when the missing values are filled in.
The same is true of Example 2. Fitting the model $M$ is not difficult but
the model $M^*$ is much simpler.

For  such  a  small  problem  as  Example  2  there  is  little  practical 
advantage  to  be  gained  from  the  transformation  technique.  The  motivation 
for  this  research  lies  in  some  very  large  problems  considered  by  Fienberg 
and  Wasserman  (1981).  We  discuss  their  examples  and  some  related  theory 
in  section  5. 

Thus  far  we  have  not  given  any  motivation  for  the  data  transformation 
of Example 2. We now continue the example and give a heuristic justification
of the method and at the same time present a more realistic version
of this problem.
Let us consider a general log-affine model for the Poisson data, z,
with mean value, m, namely

\[ \ln(m) \in \ln(d) + M \]

where d is any fixed triple of positive numbers and $M$ is as before.
Note that if d is the vector of all ones then this reduces to a
simple log-linear model. Regardless of d, a version of the sufficient
statistics for this model is

\[ v_1 = 2z_1 + z_2 \]

and

\[ v_2 = z_2 + 2z_3 . \]

Now consider the table $z^*$ as a transformation, g, of z, i.e.

\[ z^* = g(z) = \begin{pmatrix} 2z_1 & z_2 \\ z_2 & 2z_3 \end{pmatrix} . \]

We now note that $z^*_{1+} = z^*_{+1} = v_1$ and $z^*_{2+} = z^*_{+2} = v_2$. In other
words the sufficient statistics for the data z with model $M$ are
represented twice in the margins of $z^*$. Thus if we fit the row and
column margins model, $M^*$, to $z^*$ we might expect that the likelihood equation
for model $M$ is also satisfied. This turns out to be the case, but we have
ignored the question of whether $\hat{m}$ satisfies the log-affine model. We shall
see that if we fit the log-affine model

(3.1.1)  $\ln(m^*) \in \ln(g(d)) + M^*$

to the data $z^*$ then the M.L.E., $\hat{m}$, can be recovered. The simple IPFP,
with starting table $g(d)$, will converge to the M.L.E.
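
The recipe just described can be sketched as follows: transform z and d with g, run the simple IPFP on the 2 x 2 table starting from g(d), and map the fitted table back. The offset `d_example`, the number of cycles, and the function names are assumptions for illustration. The inverse map averages the two off-diagonal cells, which coincide in the limit.

```python
import numpy as np


def g(v):
    """Map a triple (v1, v2, v3) to the 2 x 2 table [[2*v1, v2], [v2, 2*v3]]."""
    v1, v2, v3 = v
    return np.array([[2.0 * v1, v2], [v2, 2.0 * v3]], dtype=float)


def g_inverse(t):
    """Map a (nearly) symmetric 2 x 2 table back to a triple."""
    return np.array([t[0, 0] / 2.0, (t[0, 1] + t[1, 0]) / 2.0, t[1, 1] / 2.0])


def ipfp_2x2(data, start, n_cycles=200):
    """Alternately match the row and column margins of `data`, starting from `start`."""
    q = start.astype(float).copy()
    for _ in range(n_cycles):
        q *= (data.sum(axis=1) / q.sum(axis=1))[:, None]
        q *= (data.sum(axis=0) / q.sum(axis=0))[None, :]
    return q


z_obs = np.array([1.0, 3.0, 5.0])
d_example = np.array([1.0, 1.0, 2.0])   # hypothetical fixed offset triple
m_star_fit = ipfp_2x2(g(z_obs), g(d_example))
m_fit = g_inverse(m_star_fit)
u = np.log(m_fit / d_example)            # should lie in span{(2,1,0), (0,1,2)}
```

A vector u lies in the span of (2,1,0) and (0,1,2) exactly when $2u_2 = u_1 + u_3$, so that identity together with the two margin equations is a convenient numerical check.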

In  section  4  we  discuss  what  conditions  are  necessary  to  justify 
procedures  such  as  those  discussed  above. 

4.  A  Transformation  Theorem 

We present a collection of conditions (grandiloquently labelled as
a  theorem)  relating  to  how  one  may  transform  estimation  problems.  First 
we  consider  a  very  weak  condition  which  will  be  used  in  the  theorem  and 
which  is  itself  sometimes  useful. 

The  idea  of  this  first  result  is  that  it  is  often  possible  to 
fortuitously  solve  a  difficult  estimation  problem  by  "accidentally" 
satisfying  the  conditions.  Consider  the  problem 

    maximize $f(m|z)$

    subject to $m \in V$

where $V$ is some constraint space. Assume f has a unique maximum over $V$
and denote the maximizing m by $\hat{m}$. Now consider the problem

    maximize $f(m|z)$

    subject to $m \in V'$

where $V' \supset V$. Denote the maximizing m by $\hat{m}'$. It is a trivial
observation that if $\hat{m}' \in V$ then $\hat{m}' = \hat{m}$. In other words, if the maximizing
value, $\hat{m}'$, under the weaker conditions, $V'$, happens to satisfy the stronger
conditions, $V$, then $\hat{m}'$ is also the maximizer under the stronger conditions.
Notice also that we did not require $\hat{m}'$ to be unique as the uniqueness of
$\hat{m}$ implies there is at most one $\hat{m}'$ in $V$. This idea could be used anywhere
a constrained maximum is required but there is no guarantee that $\hat{m}'$ will
be in $V$. We will use this general idea in frequency data circumstances
where we can prove that $\hat{m}'$ will be in $V$ and where the constraints $V'$ are
easier to deal with than the constraints $V$.

We  now  turn  to  a  more  refined  version  of  this  method.  The  statement 
of  the  result  is  in  terms  of  the  Kullback-Leibler  distance  but  could 
equally  be  stated  in  terms  of  the  (dual)  likelihood  function. 

Theorem 

Let g be a one to one mapping of the p.m.f.'s on a set $I$ into the
p.m.f.'s on a set $I^*$. If $E$ is a linear set of p.m.f.'s on $I$, then define
$g(E) = \{g(p) : p \in E\}$. Let $E^*$ be a linear set of p.m.f.'s on $I^*$ such that
$g(E) \subset E^*$. If g is such that

(4.1)  $I(p \| q) = k \cdot I(g(p) \| g(q))$ for $p, q \in E$,

and if $P_{E^*}(g(r)) \in g(E)$, then

\[ P_E(r) = g^{-1}\bigl(P_{E^*}(g(r))\bigr) . \]
The condition (4.1) could be generalized to allow $I(p \| q) =
f(I(g(p) \| g(q)))$ where f is any monotone one to one mapping. We have
no need for such generality here.

The  theorem  shows  that  under  certain  conditions  it  is  possible  to 
calculate  an  I-projection  in  a  transformed  table  and  then  invert  the 
transformation  to  obtain  the  I-projection  in  the  original  setting. 

Verifying  the  conditions  of  the  theorem  may  itself  be  a  difficult  task. 
There are at least two ways of using the theorem. In some situations it
may be possible to define the linear set $E^*$ so that $g(E) = E^*$. This
is the easier case and it essentially just relabels the problem. However
even  such  simple  relabeling  can  be  helpful  in  interpreting  the  model  or 
recognizing,  say,  a  model  in  the  transformed  space  for  which  closed  form 
estimates are known to exist. The second application of the theorem
requires more work to verify the conditions, but is also more generally
applicable. Here we take a linear set $E^*$ which is much larger than $g(E)$,
but we then need to prove that $P_{E^*}(g(r)) \in g(E)$. In other words, even
though $E^*$ contains $g(E)$ we need to show that for any $g(r)$, the I-projection
onto $E^*$ is always an element of $g(E)$. For a particular set of data it may
be easy to verify this condition. All we need do is fit the transformed
model and see if the I-projection is in $g(E)$. To prove this type of
result for a general class of problems is more difficult. We will
illustrate the simple case of the theorem with the following examples.
Section 5 will be devoted to a discussion of a set of examples where
$g(E) \subset E^*$.
Example 3

This  example  is  a  continuation  of  Example  1.  The  problem  concerns 
a  3  x  3  table  where  the  classifying  variables  have  a  natural  ordering. 
The  specific  model  we  consider  fits  row  and  column  margins  and  linearly- 
weighted  row  and  column  margins. 

We have previously shown that the row and column constraints can
be considered in pairs and each of the pairs of constraints can be
individually fit. Thus if $(w_1, w_2, w_3)$ are the current fitted values for,
say, the first row, we need to adjust this triple so that its row and
ordered row margins match some specified constants.

Let $E_S$ be the set of positive triples which satisfy the row and
ordered row constraints for the first row, i.e.,

\[ E_S = \{ \text{positive triples } q : 2q_1 + q_2 = 2a_R^1 - a_{OR}^1 = a_3 \text{ and } q_2 + 2q_3 = a_{OR}^1 \} . \]

Now consider the function

\[ g(w) = \begin{pmatrix} w_1 & \tfrac{1}{2} w_2 \\ \tfrac{1}{2} w_2 & w_3 \end{pmatrix} \]

and define

\[ E^* = g(E_S) = \Bigl\{ 2 \times 2 \text{ tables } \begin{pmatrix} a & b \\ c & d \end{pmatrix} \text{ such that } a + b = a + c = \tfrac{1}{2} a_3 \text{ and } d + c = d + b = \tfrac{1}{2} a_{OR}^1 \Bigr\} . \]
Note that the constraints on $E^*$ imply that b equals c which means that
$g^{-1}$ is well defined on $E^*$. It is not a difficult calculation to verify
that $I(q \| w) = I(g(q) \| g(w))$. Our theorem now allows us to calculate
$P_{E_S}(w)$ as $g^{-1}(P_{E^*}(g(w)))$.

The constraints which define $E^*$ are just simple row and column
margins. Thus the I-projection, $P_{E^*}(g(w))$, can be calculated by the
usual IPFP (i.e., adjusting row and column margins), or, as it is a
2 x 2 table, by direct calculation. As the logarithms of the starting
values, w, do not necessarily satisfy the model, the IPFP will in
general require several iterations to converge. Thus to obtain the
I-projection, $P_{E_R}(q_n)$, where $E_R$ is the space of P.D.'s which
satisfy all of the row constraints, we could transform each row of the
3 x 3 table into a 2 x 2 table, calculate with the 2 x 2 table and then
use $g^{-1}$ to return a triple of fitted values. The approach for the
columns would be similar.
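
The row adjustment of Example 3 can be sketched in Python as follows. A triple of current fitted values is mapped to a 2 x 2 table, the row and column margins of that table are fitted by the usual IPFP, and the result is mapped back. The target constants, the starting triple, the number of cycles, and the function names are hypothetical, and the targets are assumed to be attainable by some positive triple.

```python
import numpy as np


def to_table(w):
    """g: triple (w1, w2, w3) -> 2 x 2 table [[w1, w2/2], [w2/2, w3]]."""
    return np.array([[w[0], w[1] / 2.0], [w[1] / 2.0, w[2]]], dtype=float)


def to_triple(t):
    """Inverse of g on tables whose off-diagonal cells agree."""
    return np.array([t[0, 0], t[0, 1] + t[1, 0], t[1, 1]])


def fit_row(w, a_r, a_or, n_cycles=200):
    """Adjust w so that q1+q2+q3 = a_r and q2 + 2*q3 = a_or."""
    a3 = 2.0 * a_r - a_or                        # equals 2*q1 + q2
    margins = np.array([a3 / 2.0, a_or / 2.0])   # shared row and column targets
    q = to_table(np.asarray(w, dtype=float))
    for _ in range(n_cycles):
        q *= (margins / q.sum(axis=1))[:, None]
        q *= (margins / q.sum(axis=0))[None, :]
    return to_triple(q)


w_current = np.array([0.2, 0.5, 0.3])   # hypothetical current fitted values
fitted = fit_row(w_current, a_r=0.4, a_or=0.5)
ratio = fitted / w_current
```

Because the fitted triple is an I-projection of w, the log of `ratio` should be a linear function of the scores 0, 1, 2, that is `ratio[1]**2` should equal `ratio[0] * ratio[2]`.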

There is another g, which transforms the entire 3 x 3 table into
a 2 x 2 x 2 x 2 table. In this case $E^* = g(E)$ becomes the model of
no fourth order interaction for the $2^4$ table.

It is not difficult to check that the model of no fourth order interaction
corresponds to $g(E)$ and that $I(p \| q) = I(g(p) \| g(q))$. Therefore
the usual IPFP, with starting values $g(e)$ and the model of no fourth
order interaction, applied to the transformed table yields a $2^4$ table of fitted
values which can in turn be transformed (by $g^{-1}$) into a 3 x 3 table for
the original problem.

Example  4.  Paired  Comparison  Models. 

Davidson and Beaver (1977) have considered a generalization of the
Bradley-Terry model for paired comparisons which allows for ties and
order effects. Fienberg (1979) demonstrated that the models of Davidson
and Beaver were loglinear models and showed how the generalized
iterative scaling method of Darroch and Ratcliff (1972) can be used for
these models. We show how the simple IPFP can also be used to do the
estimation.

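Consider the K x K x 3 contingency table $z = \{z_{ijk}\}$ with mean,
$m = \{m_{ijk}\}$. The loglinear model corresponding to the Davidson-Beaver
model is (see Fienberg (1979)),

\[ \ln(m_{ij1}) = \mu + \alpha_{ij} + \beta_1 + \delta_i , \]

\[ \ln(m_{ij2}) = \mu + \alpha_{ij} + \beta_2 + \delta_j , \]

and

\[ \ln(m_{ij3}) = \mu + \alpha_{ij} + \beta_3 + \tfrac{1}{2}(\delta_i + \delta_j) , \]

for which the sufficient statistics are

\[ \{z_{ij+}\} , \quad \{z_{++k}\} , \quad \text{and} \quad \{z_{i+1} + z_{+i2} + \tfrac{1}{2}(z_{i+3} + z_{+i3})\} . \]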
Thus the likelihood equations are

(4.2)  $\hat{m}_{ij+} = z_{ij+}$ ;  $i,j = 1,2,\ldots,K$

(4.3)  $\hat{m}_{++k} = z_{++k}$ ;  $k = 1,2,3$

and

(4.4)  $\hat{m}_{i+1} + \hat{m}_{+i2} + \tfrac{1}{2}(\hat{m}_{i+3} + \hat{m}_{+i3}) = z_{i+1} + z_{+i2} + \tfrac{1}{2}(z_{i+3} + z_{+i3})$ ;  $i = 1,2,\ldots,K$ .

Fienberg (1979, p. 481) writes out the Darroch and Ratcliff algorithm
for this problem.

We transform z into the K x K x 4 table $z^*$ where

(4.5)  $z^*_{ij1} = 2 \times z_{ij1}$

(4.6)  $z^*_{ij2} = 2 \times z_{ij2}$

(4.7)  $z^*_{ij3} = z_{ij3}$

(4.8)  $z^*_{ij4} = z_{ij3}$ ;  $i,j = 1,2,\ldots,K$ ,

with transformed likelihood equations

(4.9)  $\hat{m}^*_{ij+} = z^*_{ij+}$ ;  $i,j = 1,2,\ldots,K$

(4.10)  $\hat{m}^*_{++k} = z^*_{++k}$ ;  $k = 1,2,3$

(4.11)  $\hat{m}^*_{i+1} + \hat{m}^*_{+i2} + \hat{m}^*_{i+3} + \hat{m}^*_{+i3} = z^*_{i+1} + z^*_{+i2} + z^*_{i+3} + z^*_{+i3}$ ;  $i = 1,2,\ldots,K$

(4.12)  $\hat{m}^*_{i+1} + \hat{m}^*_{+i2} + \hat{m}^*_{i+4} + \hat{m}^*_{+i4} = z^*_{i+1} + z^*_{+i2} + z^*_{i+4} + z^*_{+i4}$ ;  $i = 1,2,\ldots,K$ .


As the likelihood equations involve simple sums of cell counts, the
basic IPFP may be used for this problem. To invert the transformation
(4.5) - (4.8) it is necessary that $\hat{m}^*_{ij4} = \hat{m}^*_{ij3}$. Equations (4.9) and
(4.12) ensure this. Thus the M.L.E. is

(4.13)  $\hat{m}_{ij1} = \tfrac{1}{2} \hat{m}^*_{ij1}$

(4.14)  $\hat{m}_{ij2} = \tfrac{1}{2} \hat{m}^*_{ij2}$

(4.15)  $\hat{m}_{ij3} = \hat{m}^*_{ij3} = \hat{m}^*_{ij4}$ .

To make the argument rigorous it is necessary to show that if $\hat{m}^*_{ijk}$
satisfy (4.9) - (4.12) then

(i)  $\hat{m}^*_{ij3} = \hat{m}^*_{ij4}$

and

(ii)  $\hat{m}_{ijk}$ defined by (4.13) - (4.15) satisfy (4.2) - (4.4).

Condition (i) has already been mentioned and condition (ii) is easily
verified by substitution.

This example has again been a case where the transformed table and
model are in one to one correspondence with the original table and
model. The transformed model can be fitted using the simple IPFP but
as the sufficient statistics are not only margins of $z^*$, many standard
computer packages would have difficulty with this problem.
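
The bookkeeping of the transformation (4.5) - (4.8) and its inverse (4.13) - (4.15) can be sketched with NumPy. The sketch assumes that the tie category enters the item statistic with weight 1/2, which is the reading implied by doubling the first two categories and duplicating the third. The function names and the example counts are hypothetical.

```python
import numpy as np


def to_four_way(z):
    """(4.5)-(4.8): double categories 1 and 2, duplicate category 3."""
    z = np.asarray(z, dtype=float)
    return np.stack([2 * z[:, :, 0], 2 * z[:, :, 1], z[:, :, 2], z[:, :, 2]], axis=2)


def from_four_way(m_star):
    """(4.13)-(4.15): halve categories 1 and 2, keep category 3."""
    return np.stack([m_star[:, :, 0] / 2, m_star[:, :, 1] / 2, m_star[:, :, 2]], axis=2)


def item_statistic(z):
    """z_{i+1} + z_{+i2} + (1/2)(z_{i+3} + z_{+i3}) for each item i."""
    return (z[:, :, 0].sum(axis=1) + z[:, :, 1].sum(axis=0)
            + 0.5 * (z[:, :, 2].sum(axis=1) + z[:, :, 2].sum(axis=0)))


def item_statistic_star(z_star):
    """z*_{i+1} + z*_{+i2} + z*_{i+3} + z*_{+i3}: a plain sum of cells of z*."""
    return (z_star[:, :, 0].sum(axis=1) + z_star[:, :, 1].sum(axis=0)
            + z_star[:, :, 2].sum(axis=1) + z_star[:, :, 2].sum(axis=0))


# Hypothetical K = 3 table of counts with an empty diagonal (no self-comparisons).
z_pairs = np.arange(27, dtype=float).reshape(3, 3, 3)
z_pairs[np.arange(3), np.arange(3)] = 0.0
z_pairs_star = to_four_way(z_pairs)
```

Under these assumptions the transformed statistic is exactly twice the original one, so matching it in the K x K x 4 table is equivalent to matching the weighted statistic in the original table.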
5.  Social Networks

In  recent  years  there  has  been  an  increasing  interest  in  models 
for  the  analysis  of  data  from  social  networks.  A  line  of  research 
described  by  Holland  and  Leinhardt  (1981)  and  further  developed  by 
Fienberg  and  Wasserman  (1981)  and  Fienberg,  Meyer  and  Wasserman  (1981) 
has  been  particularly  fruitful. 

The basic data for these models consist of observations on the
arcs  of  a  directed  graph  (digraph)  on  g  nodes.  The  nodes,  often 
taken  to  represent  individuals  or  organizations  in  a  community,  are 
called  actors.  The  directed  arcs  linking  the  actors  represent  such 
notions  as  the  attitudes  of  an  individual  toward  another  or  the  flows 
of  resources  between  organizations. 

A social network with a single relationship connecting actors can
be described by an adjacency matrix, $X = \{X_{ij}\}$, where

\[ X_{ij} = \begin{cases} 1 & \text{if actor } i \text{ connects to actor } j \; (i \neq j) \\ 0 & \text{otherwise.} \end{cases} \]

Holland and Leinhardt (1981) develop a model, which they refer to as
$p_1$, and several submodels for such digraph data. Fienberg and
Wasserman  (1981)  extend  these  models  to  the  case  where  the  actors  form 
disjoint  groups  and  interest  lies  in  the  flows  between  groups. 
Fienberg,  Meyer  and  Wasserman  (1981)  further  extend  these  results  to 
the  situation  where  more  than  one  relationship  is  observed  between  the 
actors  or  groups. 

From  a  computational  point  of  view  all  of  these  models  are 
similar.  For  each  of  them  the  likelihood  function  can  be  viewed 
as the Poisson likelihood and the models are either loglinear models or
affine transformations of loglinear models for the mean-value parameter.
There is a further similarity in that for each case a natural presentation
of the data involves non-rectangular data arrays but there exist
transformations of the data into rectangular structures for which the
transformed sufficient statistics are simple margins. We will consider
the simple version of the problem, involving a single relationship
between actors and the most general version, involving multiple relations
between groups of actors. For these cases we will prove that
the simple IPFP can be applied to the transformed data in order to fit
the desired models using the method of maximum likelihood.

In order to develop these results we need to consider the original
data and distributions. Our presentation will emphasize the mathematical
structure, ignoring the interpretation of, and motivation for,
the models. We turn first to a development of the Holland and
Leinhardt $p_1$ distribution.

We consider the matrix $X = \{X_{ij} ; i \neq j = 1,2,\ldots,g\}$ as a random
matrix to which the distribution will apply. Consider the dyads,
or subgraphs, $D_{ij}$, between actors i and j, where

\[ D_{ij} = (X_{ij}, X_{ji}) , \quad i < j . \]

The random variable $D_{ij}$ has 4 possible values,

    (1,1)           :  Mutual
    (1,0) or (0,1)  :  Asymmetry
    (0,0)           :  Null
Under the assumption of dyadic independence, Holland and Leinhardt
(1981) propose the use of the exponential family of distributions,

\[ P(X = x) = \exp\Bigl\{ \rho \sum_{i<j} x_{ij} x_{ji} + \theta \, x_{++} + \sum_i \alpha_i x_{i+} + \sum_j \beta_j x_{+j} \Bigr\} \times K(\rho, \theta, \{\alpha_i\}, \{\beta_j\}) . \]

Now consider the random variable Y, equivalent to X, which is
defined, for $i < j$, as

\[ Y_{ij11} = X_{ij} \cdot X_{ji} \quad : \text{Mutual} \]
\[ Y_{ij10} = X_{ij} \cdot (1 - X_{ji}) \quad : \text{Asymmetric} \]
\[ Y_{ij01} = (1 - X_{ij}) \cdot X_{ji} \quad : \text{Asymmetric} \]
\[ Y_{ij00} = (1 - X_{ij})(1 - X_{ji}) \quad : \text{Null} \]

corresponding to the values of $D_{ij}$. Fienberg and Wasserman (1981)
show that in terms of Y, the log likelihood function for the model
$p_1$ is:


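\[ l(\rho, \theta, \{\alpha_i\}, \{\beta_j\} \mid y)
 = \rho \sum_{i<j} y_{ij11} + \theta \sum_{i<j} ( y_{ij10} + y_{ij01} + 2 y_{ij11} ) \]
\[ \quad + \sum_i \alpha_i \Bigl[ \sum_{j>i} ( y_{ij10} + y_{ij11} ) + \sum_{h<i} ( y_{hi01} + y_{hi11} ) \Bigr] \]
\[ \quad + \sum_j \beta_j \Bigl[ \sum_{i<j} ( y_{ij10} + y_{ij11} ) + \sum_{h>j} ( y_{jh01} + y_{jh11} ) \Bigr] \]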
Now view y as an element of $\mathcal{Y}$ = {p.m.f.'s on the index set $\mathcal{K}$} where
$\mathcal{K} = \{(i,j,k,l) ; i < j = 1,2,\ldots,g ; k,l = 0,1\}$. If we consider y to
be distributed as a collection of independent Poisson random variables
with mean $q \in \mathcal{Y}$ then the likelihood is exactly that which would be
obtained by using the loglinear model

\[ \ln(q) \in M \subset \mathbb{R}^{\mathcal{K}} . \]

The manifold, $M$, is spanned by the vectors $f^\delta \in \mathbb{R}^{\mathcal{K}}$, $\delta = 1,2,\ldots,2+2g$,
given by

(5.1)  $\rho$ :  $f^1(i,j,k,l) = 1$ if $(k,l) = (1,1)$ ; 0 otherwise

(5.2)  $\theta$ :  $f^2(i,j,k,l) = 2$ if $(k,l) = (1,1)$ ; 1 if $(k,l) = (1,0)$ or $(0,1)$ ; 0 otherwise

(5.3)  $\alpha_{i'}$ :  $f^{2+i'}(i,j,k,l) = 1$ if $(k,l) = (1,1)$ or $(1,0)$ and $i = i'$, $j > i'$,
       or $(k,l) = (1,1)$ or $(0,1)$ and $j = i'$, $i < j$ ; 0 otherwise ;  $i' = 1,2,\ldots,g$

(5.4)  $\beta_{j'}$ :  $f^{2+g+j'}(i,j,k,l) = 1$ if $(k,l) = (1,1)$ or $(1,0)$ and $j = j'$, $i < j'$,
       or $(k,l) = (1,1)$ or $(0,1)$ and $i = j'$, $j > i$ ; 0 otherwise ;  $j' = 1,2,\ldots,g$

This spanning set was chosen so that the inner product of an observed
y with the f's yields the sufficient statistics:

(5.5)  $\rho$ :  $a^1 = \sum_{i<j} y_{ij11}$

(5.6)  $\theta$ :  $a^2 = \sum_{i<j} ( y_{ij10} + y_{ij01} + 2 y_{ij11} )$

(5.7)  $\alpha_{i'}$ :  $a^{2+i'} = \sum_{j>i'} ( y_{i'j10} + y_{i'j11} ) + \sum_{h<i'} ( y_{hi'01} + y_{hi'11} )$ ;  $i' = 1,2,\ldots,g$

(5.8)  $\beta_{j'}$ :  $a^{2+g+j'} = \sum_{i<j'} ( y_{ij'10} + y_{ij'11} ) + \sum_{h>j'} ( y_{j'h01} + y_{j'h11} )$ ;  $j' = 1,2,\ldots,g$


We now collect the spanning vectors into $F = \{f^h : h = 1,2,\ldots,(2+2g)\}$
and the observed sufficient statistics into
$A = \{a^h ; h = 1,2,\ldots,(2+2g)\}$. If we define the linear space of
P.D.'s, $E$, by the constraints, $F$, and corresponding constants, $A$,
then the M.L.E. is

\[ \hat{q} = P_E(r) \]

where $r \in \mathcal{Y}$ and $r = c$ on all of $\mathcal{K}$. Thus a natural setting
for the estimation of $p_1$ is as a loglinear model on $\mathcal{Y}$. As the
vector $f^2$ is not a zero-one vector, and cannot be cast in this form,
the basic IPFP cannot be used for the estimation problem. In addition
for many problems g will be so large that Newton's method cannot be used.
It would be desirable if the problem could be put in a form where a
standard algorithm could be used.
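
The passage from the adjacency matrix X to the dyad indicators y and the sufficient statistics (5.5) - (5.8) can be sketched in Python. The function names and the small example digraph are hypothetical. In terms of X the four statistics reduce to the number of mutual dyads, the total number of arcs, the out-degrees and the in-degrees, which gives a simple check.

```python
import numpy as np


def dyad_table(x):
    """Return y[i, j, k, l] = 1 for i < j when (x_ij, x_ji) = (k, l), else 0."""
    x = np.asarray(x)
    n = x.shape[0]
    y = np.zeros((n, n, 2, 2), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            y[i, j, x[i, j], x[j, i]] = 1
    return y


def sufficient_statistics(y):
    """Statistics (5.5)-(5.8) computed from the dyad table y."""
    rho = y[:, :, 1, 1].sum()
    theta = (y[:, :, 1, 0] + y[:, :, 0, 1] + 2 * y[:, :, 1, 1]).sum()
    # alpha_i: dyads (i, j) with k = 1, plus dyads (h, i) with l = 1.
    alpha = y[:, :, 1, :].sum(axis=(1, 2)) + y[:, :, :, 1].sum(axis=(0, 2))
    # beta_j: dyads (i, j) with k = 1, plus dyads (j, h) with l = 1.
    beta = y[:, :, 1, :].sum(axis=(0, 2)) + y[:, :, :, 1].sum(axis=(1, 2))
    return rho, theta, alpha, beta


# Hypothetical digraph on 4 actors (zero diagonal).
x_adj = np.array([[0, 1, 1, 0],
                  [1, 0, 0, 1],
                  [0, 0, 0, 1],
                  [1, 0, 1, 0]])
y_dyads = dyad_table(x_adj)
rho_stat, theta_stat, alpha_stat, beta_stat = sufficient_statistics(y_dyads)
```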


The space $\mathcal{Y}$ is a rather convoluted construction. It would be
more natural to work with $\mathcal{Y}^*$ = {p.m.f.'s on the index set $\mathcal{K}^*$} where
$\mathcal{K}^* = \{(i,j,k,l) : i,j = 1,2,\ldots,g ; k,l = 0,1\}$, the space of
g x g x 2 x 2 tables. To this end consider the transformation

\[ g : \mathcal{Y} \to \mathcal{Y}^* \]

with $y^* = g(y)$ defined by

\[ y^*_{ijkl} = y^*_{jilk} = \begin{cases} y_{ijkl} & i < j \\ 0 & i = j . \end{cases} \]

In other words we have transformed the problem into a g x g x 2 x 2
contingency table with zeros on the "diagonals". The sufficient
statistics (5.5) - (5.8) appear (sometimes more than once) as the
[12], [13], [14], [23], [24], and [34] margins of $y^*$. Now consider
the linear space of p.m.f.'s, $E^*$, defined by

    $F^*$ = {[12], [13], [14], [23], [24], and [34] margin functions}

and

    $A^*$ = {[12], [13], [14], [23], [24], and [34] margins of $y^*$} .

We should note that $E^*$ is not equal to $g(E)$. In fact,

\[ g(E) = E^* \cap \{ y^*_{ijkl} : y^*_{ijkl} = y^*_{jilk} \} . \]

In other words $g(E)$ is a strict subset of $E^*$. As the model, $E^*$,
requires just simple margins of a rectangular data array, the basic
IPFP found in many computer packages can be used. We would like to be
able to fit just $E^*$ to $y^*$, ignoring the symmetry constraints.
Let

\[ q^* = P_{E^*}(g(r)) \]

where

\[ g(r)_{ijkl} = \begin{cases} c & i \neq j \\ 0 & i = j . \end{cases} \]

As $q^*$ is easy to calculate we would like to assert that $q^* \in g(E)$.
One method of proceeding would be to go ahead and fit $E^*$ to $y^*$.
If $q^*$ has the desired symmetries then all is well. In general we
need to prove that for an arbitrary $y^*$, $q^*$ must be in $g(E)$.

Our  first  version  of  this  proof  relied  upon  the  actual  calcula¬ 
tions  involved  in  the  IPFP  to  show  the  symmetry.  The  proof  presented 
here  is  much  simpler  and  relies  only  on  an  invariance  argument. 

Let h denote the mapping from $\mathbb{R}^{g \times g \times 2 \times 2}$ into $\mathbb{R}^{g \times g \times 2 \times 2}$ defined
by

\[ h(y)_{ijkl} = y_{jilk} , \]

i.e., the symmetry transformation. In order that $q^*$ be in $g(E)$ we
require that

\[ h(P_{E^*}(g(r))) = P_{E^*}(g(r)) . \]

Now notice that

    h([12] margin function) = [12] margin function ,

    h([13] margin function) = [24] margin function ,

and that each of the other margin functions in $F^*$ is mapped into
another margin function in $F^*$. Similarly

    h([13] margin for data $y^*$) = [24] margin for data $y^*$ .

In other words, $h(F^*) = F^*$ and $h(A^*) = A^*$ which together imply
that $h(E^*) = E^*$. Also note that $h(g(r)) = g(r)$. We can then
assert that

\[ q^* = P_{E^*}(g(r)) = P_{h(E^*)}(h(g(r))) = h(q^*) \]

and hence the result.

We have now shown that the M.L.E. $q^*$ resulting from fitting $E^*$
to $y^*$ is in $g(E)$ and hence $\hat{q} = g^{-1}(q^*)$.
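
The invariance argument can be illustrated numerically. The sketch below builds a symmetric g x g x 2 x 2 table with zero diagonals, fits the six two-way margins by the basic IPFP starting from a table that is constant off the diagonal, and checks that the fitted table is unchanged by h. Positive pseudo-counts drawn from a seeded random generator are used instead of real 0-1 dyad data so that the fit is interior and the iteration converges quickly. The data, the number of cycles, and all function names are assumptions made for illustration.

```python
import itertools
import numpy as np


def symmetrize(y):
    """g: copy y_ijkl (i < j) into y*_ijkl and y*_jilk; diagonals stay zero."""
    n = y.shape[0]
    y_star = np.zeros(y.shape, dtype=float)
    for i in range(n):
        for j in range(i + 1, n):
            y_star[i, j] = y[i, j]
            y_star[j, i] = y[i, j].T
    return y_star


def h(t):
    """Symmetry transformation h(t)_ijkl = t_jilk."""
    return t.transpose(1, 0, 3, 2)


def ipfp(target, start, margins, n_cycles=500):
    """Cyclically rescale q so that each listed margin matches that of target."""
    q = start.astype(float).copy()
    for _ in range(n_cycles):
        for axes in margins:
            other = tuple(a for a in range(q.ndim) if a not in axes)
            want = target.sum(axis=other, keepdims=True)
            have = q.sum(axis=other, keepdims=True)
            # Structural zeros give 0/0; leave those cells at zero.
            q = q * np.divide(want, have, out=np.zeros_like(have), where=have > 0)
    return q


n_actors = 4
rng = np.random.default_rng(0)
y_pseudo = np.zeros((n_actors, n_actors, 2, 2))
for i in range(n_actors):
    for j in range(i + 1, n_actors):
        y_pseudo[i, j] = rng.uniform(0.5, 1.5, size=(2, 2))  # hypothetical data
y_sym = symmetrize(y_pseudo)

start_table = np.ones((n_actors, n_actors, 2, 2))
start_table[np.arange(n_actors), np.arange(n_actors)] = 0.0  # g(r) with c = 1

two_way = list(itertools.combinations(range(4), 2))  # [12], [13], ..., [34]
q_fit = ipfp(y_sym, start_table, two_way)
```

The individual IPFP steps are not symmetric, so the symmetry of `q_fit` is a property of the limit rather than of each iterate.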

There are numerous submodels of $p_1$ considered by Holland and
Leinhardt (1981) and Fienberg and Wasserman (1981). These models,
represented in terms of parameters and margins in the $y^*$ table,
are listed in Table 5.1.


Table 5.1  Submodels of $p_1$

Special Case    Parameters                      Margins Fitted
(i)             ρ, θ, {α_i}, {β_j}              [12] [13] [14] [23] [24] [34]
(ii)            θ, {α_i}, {β_j}                 [12] [13] [14] [23] [24]
(iii)           ρ, θ, {α_i}                     [12] [13] [24] [34]
(iv)            θ, {α_i}                        [12] [13] [24]
(v)             ρ, θ, {β_j}                     [12] [14] [23] [34]
(vi)            θ, {β_j}                        [12] [14] [23]
(vii)           ρ, θ                            [12] [34]
(viii)          θ                               [12] [3] [4]

Each of these sets of margins is invariant under $h$ and the argument is applicable. For the $p_1$ problem all of the models in Table 5.1 can be fit using the basic IPFP on the data $y^*$.

Our second example concerns a class of loglinear models for multivariate directed graphs as described in Fienberg, Meyer and Wasserman (1981). They consider a set of data concerning the interrelationships between 73 organizations in a small community. Three types of relationships were observed for each of the pairs of organizations, but for simplicity we restrict our attention to two of these criteria, support and money. For each criterion the organizations were asked to respond to the questions:

(i) to which organizations do you give support (money)?

(ii) from which organizations do you receive support (money)?

A particular directed relationship (i.e., giving or receiving) is regarded to be present if either or both of the organizations in a pair perceived the relationship. For each pair of organizations it is possible to construct a four-vector of zeros and ones indicating the presence or absence of (support out, support in, money out, money in). Consider for the moment just the support relationship. A pair of organizations is said to have a Mutual relationship if they support each other (i.e., (support out, support in) = (1,1)), a Null relationship if neither supports the other (i.e., (0,0)), or an Asymmetric relationship if support is unreciprocated (i.e., (0,1) or (1,0)).

If we aggregate over all $\binom{73}{2} = 2628$ pairs of organizations there are ten distinguishable support-money relationships, namely,

MM   with four-vector   (1,1,1,1)
MA                      (1,1,0,1) or (1,1,1,0)
MN                      (1,1,0,0)
AM                      (0,1,1,1) or (1,0,1,1)
AA                      (0,1,0,1) or (1,0,1,0)
AĀ                      (0,1,1,0) or (1,0,0,1)
AN                      (0,1,0,0) or (1,0,0,0)
NM                      (0,0,1,1)
NA                      (0,0,1,0) or (0,0,0,1)
NN                      (0,0,0,0)
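The ten-way classification can be reproduced by enumerating all sixteen four-vectors. The sketch below is illustrative only: the function names are hypothetical, and the label `AAbar` is an assumed name for the asymmetric-asymmetric case in which support and money flow in opposite directions.

```python
from itertools import product
from math import comb


def dyad_state(out, inn):
    """Mutual, Null or Asymmetric state of one relationship in a pair."""
    if out and inn:
        return "M"
    if not out and not inn:
        return "N"
    return "A"


def classify(vec):
    """Map (support out, support in, money out, money in) to a type label."""
    s_out, s_in, m_out, m_in = vec
    s, m = dyad_state(s_out, s_in), dyad_state(m_out, m_in)
    if s == "A" and m == "A":
        # Same direction -> "AA"; opposite directions -> "AAbar".
        return "AA" if s_out == m_out else "AAbar"
    return s + m


types = {}
for vec in product((0, 1), repeat=4):
    types.setdefault(classify(vec), []).append(vec)

for name, vecs in types.items():
    print(name, vecs)
print("number of types:", len(types))
print("number of unordered pairs:", comb(73, 2))
```

Types that correspond to a single four-vector (MM, MN, NM, NN) are unchanged when the two organizations in a pair are interchanged; the other six types each correspond to two four-vectors that are exchanged by that relabeling.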

Notice that when both relationships are asymmetric there are two different cases, corresponding to whether the relationships flow in the same or in different ways. We denote the table of observed probabilities by $z$ where for example $z_{MM}$ is the number of mutual-mutual relationships divided by $\binom{73}{2} = 2628$. The table is represented by

                        MONEY
                   M         A              N
             M   z_MM      z_MA           z_MN
  SUPPORT    A   z_AM      z_AA, z_AĀ     z_AN
             N   z_NM      z_NA           z_NN

Fienberg, Meyer and Wasserman (1981) model the probability, $q = \{q_{ab};\ a,b = M,A,N\}$, that a randomly selected dyad will be assigned to a certain cell. They consider linear models for $\zeta = \{\zeta_{ab};\ a,b = M,A,N\}$ where

$$\zeta_{ab} = \begin{cases} \log(q_{ab}) & \text{if } a, b \text{ each equal } M \text{ or } N \\ \log(q_{ab}/2) & \text{if either } a \text{ or } b \text{ equals } A. \end{cases}$$

These models are affine translations of loglinear models for $q$. The arguments presented here apply to all of their models.

The model we consider takes as a linear space, $E$, of p.m.f.'s the set of tables, $s$, which have margins $s_{a+}$ and $s_{+b}$, $a,b = M,A,N$, which are the same as the corresponding margins for the z-table. For example we require

$$s_{A+} = s_{AM} + s_{AA} + s_{A\bar{A}} + s_{AN} = z_{AM} + z_{AA} + z_{A\bar{A}} + z_{AN} = z_{A+}.$$

In order to have the model be linear in $\zeta$, we need

$$q = \mathbb{P}_{E}(r)$$

where

$$r_{ab} = \begin{cases} 1 & \text{if } a, b \text{ each equal } M \text{ or } N \\ 2 & \text{if either } a \text{ or } b \text{ equals } A. \end{cases}$$


As the model space can be spanned by vectors consisting of 0's and 1's, the simple IPFP, which takes an initial table, r, and successively adjusts the row and column "margins" to match those in the observed table, can be used. This algorithm is easy to do by hand, but because the z-table is not rectangular (i.e., it has 10 cells rather than the 9 one would expect), and consequently has an extended interpretation of margin totals, many standard IPFP computer programs would not be able to analyze this table. Moreover, many of the models considered by Fienberg, Meyer and Wasserman are not so simple, and the computations on the z-table require more than the simple IPFP. For this reason we prefer to work with a transformed problem, where the sufficient statistics for the models can be represented by simple marginal totals.

An alternate, though somewhat deceptive, description of the data is to consider four-vectors for each of the $\binom{73}{2} \times 2$ ordered pairs of organizations and to aggregate this into a $2^4$ table, $y = \{y_{ijkl}\}$, $i,j,k,l = 1,2$, where a 1 indicates the presence of a flow and a 2 indicates the absence of a flow. Thus $y_{1111}$ is the number of mutual-mutual relationships divided by 5256. The y-table duplicates certain relationships and gives double weight to certain others. The y-table has the form,

                        money out:     1        1        2        2
                        money in:      1        2        1        2
  supp out   supp in
     1          1                    y_1111   y_1112   y_1121   y_1122
     1          2                      .        .        .        .
     2          1                      .        .        .        .
     2          2                      .        .        .        .

We now consider the transformation which maps the z-table into the y-table; viz.,

$$g : z \mapsto \frac{1}{2} \begin{pmatrix} 2z_{MM} & z_{MA} & z_{MA} & 2z_{MN} \\ z_{AM} & z_{AA} & z_{A\bar{A}} & z_{AN} \\ z_{AM} & z_{A\bar{A}} & z_{AA} & z_{AN} \\ 2z_{NM} & z_{NA} & z_{NA} & 2z_{NN} \end{pmatrix} = y.$$


We denote the factors support (out, in), money (out, in) by the numbers 1, 2, 3, and 4. It is now easy to see that the marginal sums considered for the z-table can all be found (twice) in the [12] and [34] margins of the y-table. Also note that the y-table has a strong symmetry, $y_{ijkl} = y_{jilk}$. Now $g(E)$ is just the set of tables which have (i) the correct [12] and [34] margins and (ii) preserve the observed symmetry in the y-table. Consider just the first of these conditions ignoring the symmetry constraint. It is this model which we shall consider to be $E^*$. As we have relaxed some conditions it is clear that $g(E) \subset E^*$.
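The map from the ten-cell z-table to the $2^4$ y-table can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the z values are hypothetical, the label `AAbar` is an assumed name for the opposite-direction asymmetric case, array index 0 stands for level 1 (flow present) and index 1 for level 2 (flow absent), and each z cell is spread equally over the y cells that share its type.

```python
import numpy as np
from itertools import product


def classify(vec):
    """Type label for (support out, support in, money out, money in)."""
    def state(out, inn):
        return "M" if out and inn else ("N" if not out and not inn else "A")
    s_out, s_in, m_out, m_in = vec
    s, m = state(s_out, s_in), state(m_out, m_in)
    if s == "A" and m == "A":
        return "AA" if s_out == m_out else "AAbar"
    return s + m


def g_map(z):
    """Spread each z cell equally over the y cells of the same type."""
    cells = {}
    for idx in product((0, 1), repeat=4):
        vec = tuple(1 - t for t in idx)      # index 0 means flow present
        cells.setdefault(classify(vec), []).append(idx)
    y = np.zeros((2, 2, 2, 2))
    for name, idxs in cells.items():
        for idx in idxs:
            y[idx] = z[name] / len(idxs)
    return y


# Hypothetical z-table of observed proportions (sums to one).
z = {"MM": 0.05, "MA": 0.04, "MN": 0.03,
     "AM": 0.06, "AA": 0.10, "AAbar": 0.02, "AN": 0.15,
     "NM": 0.01, "NA": 0.04, "NN": 0.50}

y = g_map(z)
m12 = y.sum(axis=(2, 3))                     # [12] margin of the y-table
m34 = y.sum(axis=(0, 1))                     # [34] margin of the y-table

print("total:", y.sum())
print("symmetric under h:", np.allclose(y, y.transpose(1, 0, 3, 2)))
print("[12] margin:\n", m12)
print("[34] margin:\n", m34)
```

In this sketch the support margin for A appears twice in `m12`, once in each off-diagonal entry, each carrying half of its z-table value; the M and N margins sit on the diagonal.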

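From here on the argument proceeds in the same manner as in the single relationship case. It is convenient now to explicitly define the space $E^*$ and the conditions we need to verify to show that $\mathbb{P}_{E^*}(g(r))$ is in $g(E)$. Consider

$$F^* = \{f_1, \ldots, f_8\}$$

where each $f_j$ is a 0-1 function on the $4 \times 4$ layout of the y-table that indicates one cell of the [12] margin (a row) or one cell of the [34] margin (a column). For example, the indicators of the last row and of the last column are

$$\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$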

and constants $A^* = \{a_1, \ldots, a_8\}$ where $a_j = \langle f_j, g(z) \rangle$. Note that $a_2 = a_3$ and $a_6 = a_7$. We define $E^*$ to be the space of P.D.'s defined by $F^*$ and $A^*$. Now consider the symmetry transformation:

$$h : y_{ijkl} \mapsto y_{jilk}.$$

For $\mathbb{P}_{E^*}(g(r))$ to be in $g(E)$ we require

$$h(\mathbb{P}_{E^*}(g(r))) = \mathbb{P}_{E^*}(g(r)).$$

It is possible to assert this because the space $E^*$ is invariant under $h$. Specifically $h(f_i) = f_i$ for $i = 1, 4, 5, 8$ and $h(f_2) = f_3$, $h(f_3) = f_2$, $h(f_6) = f_7$ and $h(f_7) = f_6$. Because $a_2 = a_3$ and $a_6 = a_7$ the linear space $h(E^*)$ generated by $h(F^*)$ and $h(A^*)$ is the same as $E^*$. We also note that $h(g(r)) = g(r)$, because of the nature of the $g$ function. That is, the starting values necessarily satisfy the symmetry constraints. Now let

$$q^* = \mathbb{P}_{E^*}(g(r))$$

and

$$\tilde{q}^* = \mathbb{P}_{h(E^*)}(h(g(r))) = \mathbb{P}_{E^*}(g(r)).$$

But note that $\tilde{q}^* = h(q^*)$ as all we have done is relabel the co-ordinates. Thus

$$q^* = \tilde{q}^* = h(q^*),$$

i.e., the fitted P.D. is (i) invariant under $h$ and (ii) is in $E^*$. Thus $q^*$ is in $g(E)$ and $g^{-1}(q^*)$ is the fitted P.D. in the space of z-tables.

For any of the other models considered by Fienberg, Meyer and Wasserman, it is easy to show that the space, $E^*$, is invariant under $h$ and thus the above argument still works.

In these examples, g(r) is the uniform distribution; thus the IPFP with starting value all ones is an appropriate algorithm. For some of the models, the appropriate margins of the y*-table represent a decomposable model; in fact the model [12], [34] is itself decomposable. Thus we have not only found an easy computational procedure, but have also discovered closed-form estimates for some of the models. The existence and nature of closed-form estimates varies with the number of relationships between actors which are modeled.
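For the model [12], [34] the closed form is the familiar product of the two fitted margins, which is the standard result for this decomposable model rather than a formula quoted from the text above. The sketch below assumes NumPy and a hypothetical symmetric $2^4$ table `y` that sums to one; it compares the product form with one cycle of the IPFP started from a table of ones and checks that the fit is invariant under $h$.

```python
import numpy as np


def h(x):
    """Symmetry map: h(x)[i, j, k, l] = x[j, i, l, k]."""
    return x.transpose(1, 0, 3, 2)


rng = np.random.default_rng(1)

# Hypothetical symmetric y-table that sums to one.
x = rng.uniform(0.5, 1.5, size=(2, 2, 2, 2))
y = x + h(x)
y /= y.sum()

m12 = y.sum(axis=(2, 3))                  # observed [12] margin
m34 = y.sum(axis=(0, 1))                  # observed [34] margin

# Closed form for [12], [34]: product of the two margins.
q_closed = m12[:, :, None, None] * m34[None, None, :, :]

# One IPFP cycle from a starting table of ones.
q = np.ones_like(y)
q *= (m12 / q.sum(axis=(2, 3)))[:, :, None, None]   # match the [12] margin
q *= (m34 / q.sum(axis=(0, 1)))[None, None, :, :]   # match the [34] margin

print("IPFP equals closed form:", np.allclose(q, q_closed))
print("fit invariant under h:", np.allclose(q_closed, h(q_closed)))
```

Because the two margins involve disjoint sets of factors, the second adjustment leaves the first margin intact, so the IPFP stops changing after a single cycle in this case.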

The analysis of the multiple relationship data that we have considered has been for the data aggregated over all the actors. In some situations it may be desirable to aggregate over only groups of actors, in which case there is a $2^4$ (or with 3 relationships, $2^6$) table for each group of actors. In this manner it is possible for the number of entries in the table, and the number of parameters in the corresponding models, to grow very large. Under these circumstances the transformation techniques outlined in this chapter prove to be of considerable practical use.

6.  Desiderata 

We  conclude  this  chapter  with  a  few  questions  and  cautions.  The 
examples  have  shown  situations  where,  for  reasons  of  computational  ease, 
it  was  desirable  to  transform  a  contingency  table  into  a  related  but 
larger  table.  In  the  transformed  table  it  was  possible  to  fit  a  model 
using  the  standard  IPFP  whereas  in  the  original  table  the  corresponding 
model  would  have  required  a  more  complicated  algorithm.  This  approach 
of  using  transformed  tables  is  especially  important  in  practice  as 
versions  of  the  standard  IPFP  are  widely  available  and  easy  to  use.  An 
additional  bonus  which  can  sometimes  be  found  in  the  transformed  table 
is  the  existence  of  closed  form  maximum  likelihood  estimates.  The  theory 
about  when  closed  form  estimates  exist  in  complete  tables  with  factorial 
models  is  well  known  and  such  situations  are  easily  recognized.  On  the 
contrary,  when  a  table  is  incomplete  or  has  a  more  complicated  structure, 
very  little  is  known  about  the  existence  of  closed  form  estimates.  Our 
techniques  have  merely  scratched  the  surface  of  the  more  general  question 
of  closed  form  estimates.  A  more  general  theory  of  closed  form  estimates 
for  arbitrary  loglinear  models  would  seem  desirable;  perhaps  investigations 
of  the  more  general  IPFP  will  aid  in  this. 


Throughout our discussion we have ignored the important questions of degrees of freedom calculations and asymptotic covariance estimates for the M.L.E. When $g(E) = E^*$, that is we are essentially only relabeling the problem, then any d.f. and covariances calculated in $E^*$ can be transformed back to $E$. When $g(E) \subsetneq E^*$, special care must be taken to calculate the appropriate d.f. in $E$. We know of no exact procedure for transforming covariance estimates in $E^*$ back to $E$ and suspect that it is not possible.